Chapter 4 Clustering - Definitions and Basic Algorithms

Author

  • Sariel Har-Peled
Abstract

Do not read this story; turn the page quickly. The story may upset you. Anyhow, you probably know it already. It is a very disturbing story. Everyone knows it. The glory and the crime of Commander Suzdal have been told in a thousand different ways. Don't let yourself realize that the story is the truth. It isn't. Not at all. There's not a bit of truth to it. There is no such planet as Arachosia, no such people as klopts, no such world as Catland. These are all just imaginary, they didn't happen, forget about it, go away and read something else.

In this chapter, we will initiate our discussion of clustering. Clustering is one of the most fundamental computational tasks but, frustratingly, one of the fuzziest. It can be stated informally as: "Given data, find interesting structure in the data. Go!" The fuzziness arises naturally from the requirement that the structure be "interesting", as this is not well defined and depends on human perception, which is sometimes impossible to quantify clearly. Similarly, what counts as "structure" is also open to debate. Nevertheless, clustering is inherent to many computational tasks, such as learning, searching, and data mining.

Empirical study of clustering concentrates on trying various measures for the clustering, and on trying out various algorithms and heuristics to compute these clusterings. See the bibliographical notes for some relevant references. Here, we will concentrate on some well-defined clustering tasks, including k-center clustering, k-median clustering, and k-means clustering, and on some basic algorithms for these problems.

A clustering problem is usually defined by a set of items and a distance function defined between these items. While these items might be points in ℝ^d and the distance function is just the regular ...

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 License. To view a copy of this license, visit
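The abstract names three clustering tasks without defining them here. As a rough illustration only, the sketch below evaluates the objectives as they are commonly stated: each point is charged its distance to the nearest center, and the cost is the maximum of these distances (k-center), their sum (k-median), or the sum of their squares (k-means). The function names, the Euclidean distance, and the toy data are assumptions made for this example; they are not taken from the chapter.

```python
import math

def nearest_center_distances(points, centers):
    """Distance from each point to its closest center (Euclidean, an assumption)."""
    return [min(math.dist(p, c) for c in centers) for p in points]

def k_center_cost(points, centers):
    """k-center objective: the largest point-to-nearest-center distance."""
    return max(nearest_center_distances(points, centers))

def k_median_cost(points, centers):
    """k-median objective: the sum of point-to-nearest-center distances."""
    return sum(nearest_center_distances(points, centers))

def k_means_cost(points, centers):
    """k-means objective: the sum of squared point-to-nearest-center distances."""
    return sum(d * d for d in nearest_center_distances(points, centers))

# Hypothetical toy input: four points in the plane and k = 2 candidate centers.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 3.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]

print(k_center_cost(points, centers))  # 3.0
print(k_median_cost(points, centers))  # 4.0
print(k_means_cost(points, centers))   # 10.0
```

The code only scores a given set of centers; choosing the k centers that minimize each cost is the actual algorithmic problem, and it is not attempted here.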

Similar resources

Improvement of density-based clustering algorithm using modifying the density definitions and input parameter

Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed for this method; one of the most widely used algorithms called DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...

An Introduction to Multi-objective Evolutionary Algorithms and Their Applications

This chapter provides the basic concepts necessary to understand the rest of this book. The introductory material provided here includes some basic mathematical definitions related to multi-objective optimization, a brief description of the most representative multi-objective evolutionary algorithms in current use and some of the most representative work on performance measures used to validate...

Analysis of Privacy Preserving Distributed Data Mining Protocols

(Table-of-contents excerpt) ... Declaration; Acknowledgement; Basic definitions ...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

Handbook of Cluster Analysis

Spectral clustering is a family of methods to find K clusters using the eigenvectors of a matrix. Typically, this matrix is derived from a set of pairwise similarities Sij between the points to be clustered. This task is called similarity based clustering, graph clustering, or clustering of diadic data. One remarkable advantage of spectral clustering is its ability to cluster “points” which are...

Publication date: 2008